Evaluation of tuberculosis diagnostic test accuracy using Bayesian latent class analysis in the presence of conditional dependence between the diagnostic tests used in a community-based tuberculosis screening study

Diagnostic accuracy studies in pulmonary tuberculosis (PTB) are complicated by the lack of a perfect reference standard. This limitation can be handled using latent class analysis (LCA), assuming that diagnostic test results are independent conditional on the true, unobserved PTB status. Test results may nevertheless remain dependent, for example when the diagnostic tests share a similar biological basis. If ignored, this dependence can lead to misleading inferences. Our secondary analysis used Bayesian LCA on data collected during the first year (May 2018 to May 2019) of a community-based multi-morbidity screening program conducted in the rural uMkhanyakude district of KwaZulu-Natal, South Africa. Residents of the catchment area aged ≥15 years and eligible for microbiological testing were analyzed. Probit regression methods for dependent binary data sequentially regressed each binary test outcome on the other observed test results, measured covariates and the true unobserved PTB status. Unknown model parameters were assigned Gaussian priors to evaluate overall PTB prevalence and the diagnostic accuracy of 6 tests used to screen for PTB: any TB symptom, radiologist conclusion, Computer Aided Detection for TB version 5 (CAD4TBv5≥53), CAD4TBv6≥53, Xpert Ultra (excluding trace) and culture. Before applying our proposed model, we evaluated its performance using a previously published childhood pulmonary TB (CPTB) dataset. Standard LCA assuming conditional independence yielded an unrealistic prevalence estimate of 18.6%, which was not resolved by accounting for conditional dependence among the true PTB cases only. Allowing also for conditional dependence among the true non-PTB cases produced a plausible prevalence of 1.1%. After incorporating age, sex and HIV status in the analysis, we obtained an overall prevalence of 0.9% (95% CrI: 0.6, 1.3). Males had a higher PTB prevalence than females (1.2% vs. 0.8%). Similarly, HIV-positive individuals had a higher PTB prevalence than HIV-negative individuals (1.3% vs. 0.8%).
The overall sensitivities of Xpert Ultra (excluding trace) and culture were 62.2% (95% CrI: 48.7, 74.4) and 75.9% (95% CrI: 61.9, 89.2), respectively. Any chest X-ray abnormality, CAD4TBv5≥53 and CAD4TBv6≥53 had similar overall sensitivities. An estimated 73.3% (95% CrI: 61.4, 83.4) of all true PTB cases did not report TB symptoms. Our flexible modelling approach yields plausible, easy-to-interpret estimates of sensitivity, specificity and PTB prevalence under more realistic assumptions. Failure to fully account for dependence between diagnostic tests can yield misleading inferences.
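The conditional-independence assumption behind standard LCA can be made concrete with a short sketch. Under that assumption, the probability of observing a pattern of binary test results is a prevalence-weighted mixture of two products: per-test sensitivities in the true PTB class and per-test false-positive rates in the non-PTB class. The function names and parameter values below are hypothetical, chosen only for illustration; they are not estimates from this study.

```python
from itertools import product


def pattern_prob_ci(pattern, prevalence, sens, spec):
    """P(T_1 = t_1, ..., T_J = t_J) for a two-class LCA that assumes the
    tests are independent conditional on true PTB status."""
    p_case = prevalence            # contribution from true PTB cases
    p_noncase = 1.0 - prevalence   # contribution from true non-PTB cases
    for t, se, sp in zip(pattern, sens, spec):
        # Among true cases a test is positive with probability = sensitivity.
        p_case *= se if t == 1 else 1.0 - se
        # Among non-cases a test is positive with probability = 1 - specificity.
        p_noncase *= (1.0 - sp) if t == 1 else sp
    return p_case + p_noncase


def posterior_case(pattern, prevalence, sens, spec):
    """P(true PTB | observed pattern) by Bayes' rule under the same assumption."""
    p_case = prevalence
    for t, se in zip(pattern, sens):
        p_case *= se if t == 1 else 1.0 - se
    return p_case / pattern_prob_ci(pattern, prevalence, sens, spec)


# Hypothetical values for three tests (illustration only, not study estimates).
prev = 0.01
sens = [0.60, 0.75, 0.80]
spec = [0.99, 0.995, 0.90]

probs = {pat: pattern_prob_ci(pat, prev, sens, spec)
         for pat in product([0, 1], repeat=len(sens))}
for pat, p in sorted(probs.items()):
    print(pat, round(p, 6), round(posterior_case(pat, prev, sens, spec), 3))
```

When tests share a biological basis, these products misstate the probabilities of concordant patterns, which is one way an implausible prevalence such as the 18.6% estimate can arise.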


The model can be extended to include covariates known to affect the diagnostic accuracy and/or prevalence, such that, for a binary covariate, the model probability can be re-expressed as

We focus only on the case where data are available on $J$ diagnostic tests; extension to incorporate covariates is straightforward. From (4), we have $J$ conditional probability models to fit to determine $P(T_j \mid T_1, \ldots, T_{j-1}, D)$, $j = 1, 2, \ldots, J$. Since each outcome is Bernoulli distributed and is related to a set of (binary) independent variables, we can define a binary regression model as $p = F(\mathbf{x}^{\top}\boldsymbol{\beta})$, where $\mathbf{x} = (x_0, x_1, \ldots, x_k)^{\top}$ is a $(k+1) \times 1$ vector of observed variables (or diagnostic tests) with $x_0 = 1$, $\boldsymbol{\beta}$ is a $(k+1) \times 1$ vector of unknown parameters to be estimated, and $F(\cdot)$ is a known cumulative distribution function (CDF) linking the probabilities with the linear component [2]. The unknown parameters were assigned $\pi(\boldsymbol{\beta}_d) = N(\mathbf{0}, \boldsymbol{\sigma}_d^2 I_{k+1})$ priors, where $I_{k+1}$ is an identity matrix of dimension $(k+1)$ and $\boldsymbol{\sigma}_d^2$ is a $(k+1) \times 1$ vector of variances among subjects whose true PTB status is $d$.

The probabilities were computed using probit regression models with Gaussian priors. The unknown parameters of $T_1$, $T_2$ and $T_3$ in the model $P(\,\cdot \mid T_1, T_2, T_3)$ are assigned N(0,10) priors (Table S5). The unknown parameters of $T_1$ and $T_2$ in the models $P(T_2 \mid T_1)$ and $P(T_3 \mid T_1, T_2)$ are also assigned N(0,10) priors.
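The sequential factorization described here can be sketched in Python. This is a minimal sketch, assuming a probit link (the standard normal CDF as F), two latent classes, no covariates, and hypothetical coefficient values; the function names are illustrative and are not taken from the study's code. Within each latent class, each test is regressed on an intercept (x_0 = 1) and the earlier test results, and the class-specific products are then mixed by prevalence. A non-zero coefficient on an earlier test induces conditional dependence, which the last function measures as a within-class covariance.

```python
from itertools import product
from math import erf, sqrt


def phi_cdf(x):
    """Standard normal CDF, used here as the probit link F(.)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def pattern_prob_given_class(pattern, betas):
    """P(T_1 = t_1, ..., T_J = t_J | D = d) as a product of sequential probit models.

    betas[j] (0-based j) holds j + 1 coefficients for test j: an intercept
    followed by one coefficient for each earlier test result.
    """
    prob = 1.0
    for j, t_j in enumerate(pattern):
        x = (1,) + tuple(pattern[:j])  # x_0 = 1, then the earlier results
        eta = sum(b * xi for b, xi in zip(betas[j], x))
        p_pos = phi_cdf(eta)  # P(T_j = 1 | earlier tests, D = d)
        prob *= p_pos if t_j == 1 else 1.0 - p_pos
    return prob


def pattern_prob(pattern, prevalence, betas_case, betas_noncase):
    """Marginal pattern probability: mix the two latent classes by prevalence."""
    return (prevalence * pattern_prob_given_class(pattern, betas_case)
            + (1.0 - prevalence) * pattern_prob_given_class(pattern, betas_noncase))


def within_class_cov(betas, n_tests):
    """Cov(T_1, T_2 | D = d); zero means T_1 and T_2 are conditionally independent."""
    p1 = p2 = p12 = 0.0
    for pat in product([0, 1], repeat=n_tests):
        pr = pattern_prob_given_class(pat, betas)
        p1 += pr * pat[0]
        p2 += pr * pat[1]
        p12 += pr * pat[0] * pat[1]
    return p12 - p1 * p2


# Hypothetical coefficients for J = 3 tests (illustration only).
betas_case = [(0.5,), (0.2, 0.6), (0.3, 0.4, 0.4)]
betas_noncase = [(-2.0,), (-2.5, 1.0), (-1.5, 0.8, 0.5)]
prev = 0.01

total = sum(pattern_prob(pat, prev, betas_case, betas_noncase)
            for pat in product([0, 1], repeat=3))
print(round(total, 10))
print(within_class_cov(betas_noncase, 3))
```

Setting every coefficient on an earlier test to zero in both classes recovers the conditional-independence model. Allowing non-zero coefficients in the non-PTB class as well as in the PTB class corresponds to the fuller dependence structure described in the abstract.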

Given that all the diagnostic tests in use are imperfect, the true TB status is unknown. This is the central difficulty in diagnostic studies without a perfect reference standard. However, as alluded to in the background of the abstract and in the introduction, this problem can be handled using latent class analysis, a statistical methodology that has been used over the past four decades in many disciplines but only rarely in infectious diseases, including TB. This method identifies unobserved, mutually exclusive subgroups in the population using information from the measured subject characteristics.

CAD4TBv6 yielded a sensitivity and specificity of 80% and 83%, respectively (Fig S1). This is no surprise given that the distribution of the difference in scores between the two versions of CAD4TB is centered around zero (Fig S2).
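Agreement between the two CAD4TB versions can be computed from paired scores by dichotomising both at the same cut-off and cross-tabulating. This is a minimal sketch, assuming CAD4TBv5≥53 is treated as the comparator for CAD4TBv6≥53 and using hypothetical scores rather than study data; the function name `v6_vs_v5` is illustrative.

```python
from statistics import median

THRESHOLD = 53  # positivity cut-off applied to both CAD4TB versions


def v6_vs_v5(v5_scores, v6_scores, threshold=THRESHOLD):
    """Sensitivity and specificity of v6 >= threshold, treating v5 >= threshold
    as the comparator. Element i of each list must come from the same X-ray."""
    if len(v5_scores) != len(v6_scores):
        raise ValueError("scores must be paired one-to-one")
    tp = fn = fp = tn = 0
    for s5, s6 in zip(v5_scores, v6_scores):
        ref_pos, new_pos = s5 >= threshold, s6 >= threshold
        if ref_pos and new_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif new_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical paired scores (illustration only, not study data).
v5 = [20, 45, 60, 75, 52, 90, 30, 55, 58, 10]
v6 = [25, 55, 58, 80, 40, 85, 28, 50, 66, 12]

sens, spec = v6_vs_v5(v5, v6)
diffs = [b - a for a, b in zip(v5, v6)]  # v6 minus v5, per X-ray
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}, median difference {median(diffs)}")
```

Score differences centered near zero mean most X-rays fall on the same side of the cut-off under both versions; in the hypothetical pairs, disagreement arises only for scores close to 53, such as (45, 55) and (55, 50).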

However, it is important to recognize that the two versions of CAD4TB are used to evaluate the chest X-ray images of the same person, so their scores are paired. The scatter plot shows a positive correlation between version 6 and version 5, but this correlation is not strong enough to induce strong dependence between the two versions in the model.

Table S7 presents the results of four models. The results of Model 1, Model 2 and Model 3 were already presented in the main document but are shown here to aid comparison with Model 4. These findings suggest that conditional dependence between TST and radiography among the children without CPTB cannot be ignored.

Analysis that replaces any chest X-ray abnormality with chest X-ray abnormality suggestive of active TB

Tables S10–S14 present the results of the alternative analysis that replaces any chest X-ray abnormality with chest X-ray abnormality suggestive of active TB.